In [1]:
from scipy.stats import bernoulli, binom, uniform
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
%matplotlib inline
We have a coin. We don't know whether it is biased or not (i.e. the probability of one side is greater than 0.5, and the probability of the other side is less than 0.5). How do we figure this out?
We can flip the coin many times, and measure the number of times heads comes up out of all flips.
This is a power calculation: How many times would we need to flip the coin in order to conclude that it is indeed rigged?
We can compare our measured proportion of heads to a known model of randomness.
This is hypothesis testing: How probable is it that we would have observed this many heads in our coin flips, assuming that the coin was unbiased?
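To make the hypothesis-testing question concrete (the flip count and head count below are made-up numbers, not results from this notebook), we can ask how often a fair coin would produce a result at least as extreme as the one observed:
In [ ]:
# Hypothetical example: suppose we saw 527 heads in 1000 flips (made-up numbers).
# Under the null hypothesis of a fair coin, the number of heads is
# Binomial(n=1000, p=0.5); doubling the upper-tail probability gives a
# two-sided p-value (valid here because the null distribution is symmetric).
n, k = 1000, 527
p_value = 2 * binom.sf(k - 1, n, 0.5)  # binom.sf(k - 1, ...) is P(X >= k)
p_value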
In [2]:
p_hidden = 1 / np.log10(np.log10(1 - (1 - np.exp(135))))  # a deliberately obfuscated "true" probability of heads; don't simplify it by hand!
In [3]:
n_flips = 1000
p = 0.5
bernoulli.rvs(p=p, size=n_flips)  # 1000 individual coin flips: 1 = heads, 0 = tails
Out[3]:
In [10]:
n_flips = 1000
p = 0.5
size = 1
binom.rvs(n_flips, p, size=size)  # one draw of the number of heads in 1000 flips
Out[10]:
In the previous cell, each run draws one instance of 1000 Bernoulli trials, each with probability p of turning up heads. The 1000 Bernoulli trials can be thought of as one Binomial trial.
Each time around, the number of heads drawn from the 1000 Bernoulli trials (or 1 Binomial trial) changes: each draw is a sample from Binomial(n=1000, p=0.5). What would the distribution of the proportion of heads look like over 1000 Binomial trials?
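As a quick sanity check (not part of the analysis itself), we can verify this equivalence directly: summing 1000 Bernoulli draws gives one draw from the corresponding Binomial.
In [ ]:
# Summing 1000 Bernoulli(p=0.5) draws is one draw from Binomial(n=1000, p=0.5).
heads_via_bernoulli = bernoulli.rvs(p=0.5, size=1000).sum()
heads_via_binomial = binom.rvs(n=1000, p=0.5)
heads_via_bernoulli, heads_via_binomial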
In [11]:
n_flips = 1000
p = 0.5
size = 1000
proportions = np.zeros(size)  # initialize an array to store the data
for i, n_heads in enumerate(binom.rvs(n_flips, p, size=size)):
    proportions[i] = n_heads / n_flips  # store the proportion of heads, not the raw count
plt.hist(proportions)
Out[11]:
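As an aside, the central limit theorem tells us roughly what to expect here: the proportions should look approximately normal, centered at p, with standard deviation sqrt(p * (1 - p) / n). A quick check:
In [ ]:
# By the central limit theorem, the proportion of heads over n_flips flips
# is approximately Normal(p, sqrt(p * (1 - p) / n_flips)).
p, n_flips = 0.5, 1000
np.sqrt(p * (1 - p) / n_flips)  # about 0.016, matching the histogram's spread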
In [26]:
# Flip the coin 50 times
n_flips = 50
pilot_results = bernoulli.rvs(p=p_hidden, size=n_flips)
# Compute an estimated value of p_hidden
p_hat = np.sum(pilot_results) / len(pilot_results)
p_hat
Out[26]:
We have estimated p_hidden by conducting a pilot experiment: a single draw from Binomial(n=50, p=p_hidden). The estimate of p_hidden is p_hat. Let's assume for the moment that p_hat is correct.
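Before leaning on p_hat, it is worth noting how noisy a 50-flip estimate is. A rough sketch, using the usual normal-approximation standard error for a proportion:
In [ ]:
# Approximate standard error of p_hat from the 50-flip pilot,
# using the normal approximation to the Binomial (n_flips is 50 here).
se_p_hat = np.sqrt(p_hat * (1 - p_hat) / n_flips)
se_p_hat  # on the order of 0.07, so the pilot estimate is quite uncertain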
What sample size do we need so that we can be confident that we won't:

- conclude that the coin is biased when it is actually fair (a false positive), or
- conclude that the coin is fair when it is actually biased (a false negative)?

This is about where most of our intuition fails. We have to be comfortable with being wrong simply by chance; the only question is how comfortable we are with being wrong.
In [7]:
alpha = 0.05  # tolerated false positive rate (Type I error)
beta = 0.05   # tolerated false negative rate (Type II error)
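One way to sketch the power calculation is by simulation: assume p_hat from the pilot is the true bias, and for each candidate sample size, count how often a two-sided test at level alpha would reject the fair-coin hypothesis. This is an illustrative sketch, not an exact calculation; it uses scipy.stats.binomtest (SciPy >= 1.7), and the candidate sample sizes are arbitrary choices.
In [ ]:
# A simulation sketch of the power calculation (assumes p_hat is the true bias).
# For each candidate sample size, estimate the power: the fraction of simulated
# experiments that reject p = 0.5 at significance level alpha.
from scipy.stats import binomtest  # SciPy >= 1.7

for n in [100, 250, 500, 1000]:
    draws = binom.rvs(n, p_hat, size=500)
    power = np.mean([binomtest(k, n, p=0.5).pvalue < alpha for k in draws])
    print(f"n = {n:5d}: estimated power = {power:.2f}")
    # We want the smallest n whose power reaches 1 - beta (0.95 here).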
In [8]:
p_hidden_est = binom.rvs(n=1000, p=p_hidden, size=50) / 1000  # proportions of heads under p_hidden
plt.hist(p_hidden_est, label='p_hidden')
plt.hist(proportions, label='p = 0.5')
plt.legend()
Out[8]:
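Since both histograms are on the proportion scale, we can sketch an empirical answer to the power question directly from the simulations: how often does a draw under p_hidden fall outside the central 95% of the fair-coin distribution? (A rough empirical check, not an exact calculation.)
In [ ]:
# Empirical two-sided cutoffs from the fair-coin simulation (alpha = 0.05),
# and the fraction of p_hidden draws that fall outside them.
crit_lo, crit_hi = np.percentile(proportions, [2.5, 97.5])
np.mean((p_hidden_est < crit_lo) | (p_hidden_est > crit_hi))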